{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "lb5yiH5h8x3h"
   },
   "source": [
    "##### Copyright 2024 Google LLC."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "906e07f6e562"
   },
   "outputs": [],
   "source": [
    "#@title Licensed under the Apache License 2.0\n",
    "# you may not use this file except in compliance with the License.\n",
    "# You may obtain a copy of the License at\n",
    "#\n",
    "# https://www.apache.org/licenses/LICENSE-2.0\n",
    "#\n",
    "# Unless required by applicable law or agreed to in writing, software\n",
    "# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
    "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
    "# See the License for the specific language governing permissions and\n",
    "# limitations under the License."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "1YXR7Yn480fU"
   },
   "source": [
    "# 2D spatial reasoning with Gemini 2.0"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "CeJyB7rG82ph"
   },
   "source": [
    "<table align=\"left\">\n",
    "  <td style=\"text-align: center\">\n",
    "    <a href=\"https://art-analytics.appspot.com/r.html?uaid=G-FHXEFWTT4E&utm_source=aRT-gemini-2-spatial-reasoning&utm_medium=aRT-clicks&utm_campaign=gemini-2-spatial-reasoning&destination=gemini-2-spatial-reasoning&url=https%3A%2F%2Fcolab.research.google.com%2Fgithub%2FGoogleCloudPlatform%2Fapplied-ai-engineering-samples%2Fblob%2Fmain%2Fgenai-on-vertex-ai%2Fgemini%2Fprompting_recipes%2Fspatial_reasoning%2Fspatial_reasoning_SDK_for_gemini2.ipynb\">\n",
    "      <img width=\"32px\" src=\"https://www.gstatic.com/pantheon/images/bigquery/welcome_page/colab-logo.svg\" alt=\"Google Colaboratory logo\"><br> Open in Colab\n",
    "    </a>\n",
    "  </td>\n",
    "  <td style=\"text-align: center\">\n",
    "    <a href=\"https://art-analytics.appspot.com/r.html?uaid=G-FHXEFWTT4E&utm_source=aRT-gemini-2-spatial-reasoning&utm_medium=aRT-clicks&utm_campaign=gemini-2-spatial-reasoning&destination=gemini-2-spatial-reasoning&url=https%3A%2F%2Fconsole.cloud.google.com%2Fvertex-ai%2Fcolab%2Fimport%2Fhttps%3A%252F%252Fraw.githubusercontent.com%252FGoogleCloudPlatform%252Fapplied-ai-engineering-samples%252Fmain%252Fgenai-on-vertex-ai%252Fgemini%252Fprompting_recipes%252Fspatial_reasoning%252Fspatial_reasoning_SDK_for_gemini2.ipynb\">\n",
    "      <img width=\"32px\" src=\"https://lh3.googleusercontent.com/JmcxdQi-qOpctIvWKgPtrzZdJJK-J3sWE1RsfjZNwshCFgE_9fULcNpuXYTilIR2hjwN\" alt=\"Google Cloud Colab Enterprise logo\"><br> Open in Colab Enterprise\n",
    "    </a>\n",
    "  </td>\n",
    "  <td style=\"text-align: center\">\n",
    "    <a href=\"https://art-analytics.appspot.com/r.html?uaid=G-FHXEFWTT4E&utm_source=aRT-gemini-2-spatial-reasoning&utm_medium=aRT-clicks&utm_campaign=gemini-2-spatial-reasoning&destination=gemini-2-spatial-reasoning&url=https%3A%2F%2Fconsole.cloud.google.com%2Fvertex-ai%2Fworkbench%2Fdeploy-notebook%3Fdownload_url%3Dhttps%3A%2F%2Fraw.githubusercontent.com%2FGoogleCloudPlatform%2Fapplied-ai-engineering-samples%2Fmain%2Fgenai-on-vertex-ai%2Fgemini%2Fprompting_recipes%2Fspatial_reasoning%2Fspatial_reasoning_SDK_for_gemini2.ipynb\">\n",
    "      <img src=\"https://www.gstatic.com/images/branding/gcpiconscolors/vertexai/v1/32px.svg\" alt=\"Vertex AI logo\"><br> Open in Vertex AI Workbench\n",
    "    </a>\n",
    "  </td>\n",
    "  <td style=\"text-align: center\">\n",
    "    <a href=\"https://github.com/GoogleCloudPlatform/applied-ai-engineering-samples/blob/main/genai-on-vertex-ai/gemini/prompting_recipes/spatial_reasoning/spatial_reasoning_SDK_for_gemini2.ipynb\">\n",
    "      <img width=\"32px\" src=\"https://upload.wikimedia.org/wikipedia/commons/9/91/Octicons-mark-github.svg\" alt=\"GitHub logo\"><br> View on GitHub\n",
    "    </a>\n",
    "  </td>\n",
    "</table>\n",
    "\n",
    "<div style=\"clear: both;\"></div>\n",
    "\n",
    "<b>Share to:</b>\n",
    "\n",
    "<a href=\"https://www.linkedin.com/sharing/share-offsite/?url=https%3A//github.com/GoogleCloudPlatform/applied-ai-engineering-samples/blob/main/genai-on-vertex-ai/gemini/prompting_recipes/spatial_reasoning/spatial_reasoning_SDK_for_gemini2.ipynb\" target=\"_blank\">\n",
    "  <img width=\"20px\" src=\"https://upload.wikimedia.org/wikipedia/commons/8/81/LinkedIn_icon.svg\" alt=\"LinkedIn logo\">\n",
    "</a>\n",
    "\n",
    "<a href=\"https://bsky.app/intent/compose?text=https%3A//github.com/GoogleCloudPlatform/applied-ai-engineering-samples/blob/main/genai-on-vertex-ai/gemini/prompting_recipes/spatial_reasoning/spatial_reasoning_SDK_for_gemini2.ipynb\" target=\"_blank\">\n",
    "  <img width=\"20px\" src=\"https://upload.wikimedia.org/wikipedia/commons/7/7a/Bluesky_Logo.svg\" alt=\"Bluesky logo\">\n",
    "</a>\n",
    "\n",
    "<a href=\"https://twitter.com/intent/tweet?url=https%3A//github.com/GoogleCloudPlatform/applied-ai-engineering-samples/blob/main/genai-on-vertex-ai/gemini/prompting_recipes/spatial_reasoning/spatial_reasoning_SDK_for_gemini2.ipynb\" target=\"_blank\">\n",
    "  <img width=\"20px\" src=\"https://upload.wikimedia.org/wikipedia/commons/5/53/X_logo_2023_original.svg\" alt=\"X logo\">\n",
    "</a>\n",
    "\n",
    "<a href=\"https://reddit.com/submit?url=https%3A//github.com/GoogleCloudPlatform/applied-ai-engineering-samples/blob/main/genai-on-vertex-ai/gemini/prompting_recipes/spatial_reasoning/spatial_reasoning_SDK_for_gemini2.ipynb\" target=\"_blank\">\n",
    "  <img width=\"20px\" src=\"https://redditinc.com/hubfs/Reddit%20Inc/Brand/Reddit_Logo.png\" alt=\"Reddit logo\">\n",
    "</a>\n",
    "\n",
    "<a href=\"https://www.facebook.com/sharer/sharer.php?u=https%3A//github.com/GoogleCloudPlatform/applied-ai-engineering-samples/blob/main/genai-on-vertex-ai/gemini/prompting_recipes/spatial_reasoning/spatial_reasoning_SDK_for_gemini2.ipynb\" target=\"_blank\">\n",
    "  <img width=\"20px\" src=\"https://upload.wikimedia.org/wikipedia/commons/5/51/Facebook_f_logo_%282019%29.svg\" alt=\"Facebook logo\">\n",
    "</a>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "___eV40o8399"
   },
   "source": [
    "This notebook demonstrates object detection and spatial reasoning with Gemini 2.0 in Vertex AI.\n",
    "\n",
    "You'll learn how to use Gemini to perform object detection like this:\n",
    "<img src=\"https://storage.googleapis.com/generativeai-downloads/images/cupcakes_with_bbox.png\"/>\n",
    "\n",
    "You will find live examples including object detection with\n",
    "\n",
    "* overlaying information\n",
    "* searching within an image\n",
    "* translating and understanding things in multiple languages\n",
    "* using Gemini reasoning abilities\n",
    "\n",
    "**Please note**\n",
    "\n",
    "There's no \"magical prompt\". Feel free to experiment with different ones. You can use the samples included in this notebook or upload your own images and write your own prompts.\n",
    "\n",
    "This notebook is based on a [notebook](https://github.com/google-gemini/cookbook/blob/main/gemini-2/spatial_understanding.ipynb) published by the Gemini AI Studio team.\n",
    "\n",
    "----"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "u4dbK2FbXCJE"
   },
   "source": [
    "## Setup"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "7o6sel20XCJP"
   },
   "source": [
    "Please enter the PROJECT_ID of your Google Cloud Project\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {
    "id": "iez9RaCRAegw"
   },
   "outputs": [],
   "source": [
    "PROJECT_ID = '[your-project-id]' # @param {type: 'string'}\n",
    "LOCATION = 'us-central1' # @param {type: 'string'}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "mEQKQCkBAu_f"
   },
   "source": [
    "Install or upgrade Vertex AI SDK and restart the Colab runtime (if necessary)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "u53CdzCFXCJP",
    "outputId": "e40b3454-bfe7-42d5-82d2-32b3ed88d59b"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "google-cloud-aiplatform is already installed with version 1.73.0.\n"
     ]
    }
   ],
   "source": [
    "def install_or_upgrade_vertex_ai():\n",
    "    package_name = \"google-cloud-aiplatform\"\n",
    "    required_version = \"1.73.0\"\n",
    "\n",
    "    try:\n",
    "        import google.cloud.aiplatform as aiplatform\n",
    "        installed_version = aiplatform.__version__\n",
    "\n",
    "        if installed_version < required_version:\n",
    "            print(f\"Upgrading {package_name} from version {installed_version} to {required_version}...\")\n",
    "            !pip install google-cloud-aiplatform --upgrade --quiet --user\n",
    "            print(f\"Successfully upgraded {package_name}.\")\n",
    "            restart_runtime()\n",
    "        else:\n",
    "            print(f\"{package_name} is already installed with version {installed_version}.\")\n",
    "    except ImportError:\n",
    "        print(f\"{package_name} is not installed. Installing version {required_version}...\")\n",
    "        !pip install google-cloud-aiplatform --upgrade --quiet --user\n",
    "        print(f\"Successfully installed {package_name}.\")\n",
    "        restart_runtime()\n",
    "\n",
    "def restart_runtime():\n",
    "  import IPython\n",
    "\n",
    "  print('The kernel is going to restart. In Colab or Colab Enterprise, you might see an error message that says \"Your session crashed for an unknown reason.\" This is expected.')\n",
    "  print('Once the runtime is restarted you can continue running individual cells or run the entire notebook with the \"Run all\" command.')\n",
    "  IPython.Application.instance().kernel.do_shutdown(True)\n",
    "\n",
    "install_or_upgrade_vertex_ai()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "gsNLA8_YXCJQ"
   },
   "source": [
    "###  Authenticate to Google Cloud\n",
    "\n",
    "If you're using Colab, run the code in the next cell. Follow the popups and authenticate with an account that has access to your Google Cloud [project](https://cloud.google.com/resource-manager/docs/creating-managing-projects#identifying_projects). More authentication options are discussed [here](https://cloud.google.com/docs/authentication).\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "TaPzmJ07XCJQ"
   },
   "outputs": [],
   "source": [
    "import base64\n",
    "import io\n",
    "import os\n",
    "import requests\n",
    "import sys\n",
    "from io import BytesIO\n",
    "from PIL import Image\n",
    "\n",
    "if 'google.colab' in sys.modules:\n",
    "    from google.colab import auth\n",
    "    auth.authenticate_user()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "3Hx_Gw9i0Yuv"
   },
   "source": [
    "### Initialize Vertex AI SDK client\n",
    "\n",
    "Please rerun this cell if you ever get a Session Timeout error:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "HghvVpbU0Uap"
   },
   "outputs": [],
   "source": [
    "import vertexai\n",
    "from vertexai.generative_models import (GenerativeModel, HarmBlockThreshold, HarmCategory, Part)\n",
    "\n",
    "model_name = 'gemini-2.0-flash-exp' # This specific model is required\n",
    "vertexai.init(project=PROJECT_ID, location=LOCATION)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "7RuQjPJ07Klh"
   },
   "source": [
    "### Configure the model\n",
    "\n",
    " The system instructions are mainly used to make the prompts shorter by not having to reapeat each time the format. They are also telling the model how to deal with similar objects which is a nice way to let it be creative."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "lz1yU91P7IfN"
   },
   "outputs": [],
   "source": [
    "bounding_box_system_instructions = \"\"\"\n",
    "Return bounding boxes as a JSON array with labels. Never return masks or code fencing. Limit to 25 objects.\n",
    "If an object is present multiple times, name them according to their unique characteristics (colors, size, position, unique characteristics, etc.)\n",
    "\"\"\"\n",
    "\n",
    "safety_settings= {\n",
    "    HarmCategory.HARM_CATEGORY_HARASSMENT: HarmBlockThreshold.BLOCK_ONLY_HIGH,\n",
    "    HarmCategory.HARM_CATEGORY_HATE_SPEECH: HarmBlockThreshold.BLOCK_ONLY_HIGH,\n",
    "    HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT: HarmBlockThreshold.BLOCK_ONLY_HIGH,\n",
    "    HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT: HarmBlockThreshold.BLOCK_ONLY_HIGH,\n",
    "}\n",
    "\n",
    "# Temperature above 0 is recommended to prevent repeated outputs\n",
    "generation_config = {\n",
    "    'temperature': 0.5,\n",
    "    'max_output_tokens': 8192,\n",
    "    'candidate_count': 1,\n",
    "}\n",
    "\n",
    "model = GenerativeModel(\n",
    "    model_name=model_name,\n",
    "    system_instruction=bounding_box_system_instructions,\n",
    "    generation_config=generation_config,\n",
    "    safety_settings=safety_settings,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "n4Vx-ZbZCqWf"
   },
   "source": [
    "### Utils\n",
    "\n",
    "These scripts will be needed to plot the bounding boxes. Of course they are just examples and you are free to use any other libraries."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "oxK0pycZm4AY"
   },
   "outputs": [],
   "source": [
    "# @title Bounding Box Visualization\n",
    "\n",
    "# Get Noto JP font to display japanese characters\n",
    "!apt-get install fonts-noto-cjk  # For Noto Sans CJK JP\n",
    "\n",
    "import json\n",
    "import random\n",
    "import io\n",
    "from PIL import Image, ImageDraw, ImageFont\n",
    "from PIL import ImageColor\n",
    "\n",
    "additional_colors = [colorname for (colorname, colorcode) in ImageColor.colormap.items()]\n",
    "\n",
    "def plot_bounding_boxes(im, bounding_boxes):\n",
    "    \"\"\"\n",
    "    Plots bounding boxes on an image with markers for each a name, using PIL, normalized coordinates, and different colors.\n",
    "\n",
    "    Args:\n",
    "        img_path: The path to the image file.\n",
    "        bounding_boxes: A list of bounding boxes containing the name of the object\n",
    "         and their positions in normalized [y1 x1 y2 x2] format.\n",
    "    \"\"\"\n",
    "\n",
    "    img = im.resize((1024,1024))\n",
    "    width, height = img.size\n",
    "    draw = ImageDraw.Draw(img)\n",
    "\n",
    "    # Define a list of bounding box border colors\n",
    "    colors = [\n",
    "    'red',\n",
    "    'green',\n",
    "    'blue',\n",
    "    'yellow',\n",
    "    'orange',\n",
    "    'pink',\n",
    "    'purple',\n",
    "    'brown',\n",
    "    'gray',\n",
    "    'beige',\n",
    "    'turquoise',\n",
    "    'cyan',\n",
    "    'magenta',\n",
    "    'lime',\n",
    "    'navy',\n",
    "    'maroon',\n",
    "    'teal',\n",
    "    'olive',\n",
    "    'coral',\n",
    "    'lavender',\n",
    "    'violet',\n",
    "    'gold',\n",
    "    'silver',\n",
    "    ] + additional_colors\n",
    "\n",
    "    # We parse out the markdown fencing\n",
    "    bounding_boxes = parse_json(bounding_boxes)\n",
    "\n",
    "    font = ImageFont.truetype(\"NotoSansCJK-Regular.ttc\", size=14)\n",
    "\n",
    "    # Iterate over the bounding boxes\n",
    "    for i, bounding_box in enumerate(json.loads(bounding_boxes)):\n",
    "      # Select a color from the list\n",
    "      color = colors[i % len(colors)]\n",
    "\n",
    "      # Convert normalized coordinates to absolute coordinates\n",
    "      abs_y1 = int(bounding_box[\"box_2d\"][0]/1000 * height)\n",
    "      abs_x1 = int(bounding_box[\"box_2d\"][1]/1000 * width)\n",
    "      abs_y2 = int(bounding_box[\"box_2d\"][2]/1000 * height)\n",
    "      abs_x2 = int(bounding_box[\"box_2d\"][3]/1000 * width)\n",
    "\n",
    "      if abs_x1 > abs_x2:\n",
    "        abs_x1, abs_x2 = abs_x2, abs_x1\n",
    "\n",
    "      if abs_y1 > abs_y2:\n",
    "        abs_y1, abs_y2 = abs_y2, abs_y1\n",
    "\n",
    "      # Draw the bounding box\n",
    "      draw.rectangle(\n",
    "          ((abs_x1, abs_y1), (abs_x2, abs_y2)), outline=color, width=4\n",
    "      )\n",
    "\n",
    "      # Draw the text\n",
    "      if \"label\" in bounding_box:\n",
    "        draw.text((abs_x1 + 8, abs_y1 + 6), bounding_box[\"label\"], fill=color, font=font)\n",
    "\n",
    "    # Display the image\n",
    "    img.show()\n",
    "    return img"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "DV6GB0X_GNYo"
   },
   "outputs": [],
   "source": [
    "# @title Image encoder\n",
    "def encode_image(local_file_path: str) -> Part:\n",
    "    encoded_image = base64.b64encode(open(local_file_path, 'rb').read()).decode('utf-8')\n",
    "    return Part.from_data(data=base64.b64decode(encoded_image), mime_type='image/jpeg')\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "cHbhdPsH1PAS"
   },
   "outputs": [],
   "source": [
    "# @title Model output parsing\n",
    "def parse_json(json_output):\n",
    "    # We parse out the markdown fencing\n",
    "    lines = json_output.splitlines()\n",
    "    for i, line in enumerate(lines):\n",
    "        if line == \"```json\":\n",
    "            json_output = \"\\n\".join(lines[i+1:])  # Remove everything before \"```json\"\n",
    "            json_output = json_output.split(\"```\")[0]  # Remove everything after the closing \"```\"\n",
    "            break  # Exit the loop once \"```json\" is found\n",
    "    return json_output"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "wqPO819ysM3c"
   },
   "source": [
    "### Download sample images"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "Pb2qWTIqsImv"
   },
   "outputs": [],
   "source": [
    "# Load sample images\n",
    "!wget https://storage.googleapis.com/generativeai-downloads/images/socks.jpg -O Socks.jpg -q\n",
    "!wget https://storage.googleapis.com/generativeai-downloads/images/vegetables.jpg -O Vegetables.jpg -q\n",
    "!wget https://storage.googleapis.com/generativeai-downloads/images/Japanese_Bento.png -O Japanese_bento.png -q\n",
    "!wget https://storage.googleapis.com/generativeai-downloads/images/Cupcakes.jpg -O Cupcakes.jpg -q\n",
    "!wget https://storage.googleapis.com/generativeai-downloads/images/origamis.jpg -O Origamis.jpg -q\n",
    "!wget https://storage.googleapis.com/generativeai-downloads/images/fruits.jpg -O Fruits.jpg -q\n",
    "!wget https://storage.googleapis.com/generativeai-downloads/images/cat.jpg -O Cat.jpg -q\n",
    "!wget https://storage.googleapis.com/generativeai-downloads/images/pumpkins.jpg -O Pumpkins.jpg -q\n",
    "!wget https://storage.googleapis.com/generativeai-downloads/images/breakfast.jpg -O Breakfast.jpg -q\n",
    "!wget https://storage.googleapis.com/generativeai-downloads/images/bookshelf.jpg -O Bookshelf.jpg -q\n",
    "!wget https://storage.googleapis.com/generativeai-downloads/images/spill.jpg -O Spill.jpg -q"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "WFLDgSztv77H"
   },
   "source": [
    "## Object detection with Bounding Boxes\n",
    "\n",
    "Let's start by loading an image:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "eSFBHGJjxIJb"
   },
   "outputs": [],
   "source": [
    "image = \"Cupcakes.jpg\" # @param [\"Socks.jpg\",\"Vegetables.jpg\",\"Japanese_bento.png\",\"Cupcakes.jpg\",\"Origamis.jpg\",\"Fruits.jpg\",\"Cat.jpg\",\"Pumpkins.jpg\",\"Breakfast.jpg\",\"Bookshelf.jpg\", \"Spill.jpg\"] {\"allow-input\":true}\n",
    "Image.open(image).resize((400,400))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "wYfXLLgRyDfo"
   },
   "source": [
    "Now we can test a simple prompt to find all the cupcakes in the image.\n",
    "\n",
    "To prevent the model from repeating itself, it is recommended to use a temperature above 0, in this case 0.5. Limiting the number of items (25 in the system instructions) is also a way to prevent the model from looping and to speed up the decoding of the bounding boxes."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "phxOEE6Gx_Vy"
   },
   "outputs": [],
   "source": [
    "prompt = \"Detect the 2d bounding boxes of the cupcakes (with “label” as topping description”)\"  # @param {type:\"string\"}\n",
    "\n",
    "# Run inference to find the bounding boxes\n",
    "response = model.generate_content(\n",
    "    contents=[prompt, encode_image(image)]\n",
    ")\n",
    "\n",
    "print(response.text)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "wxlObJhF00mR"
   },
   "source": [
    "As you can see, even without any instructions about the format, Gemini is trained to always use this format with a label and the coordinates of the bounding box in a \"box_2d\" array.\n",
    "\n",
    "Please note that Y coordinates preceed X coordinates which is a bit unusual."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "FZBNvc1XzJP5"
   },
   "outputs": [],
   "source": [
    "plot_bounding_boxes(Image.open(image), response.text)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "wjP8ktS62QRv"
   },
   "source": [
    "## Search within an image\n",
    "\n",
    "A more nuanced example of finding requested objects and displaying bounding boxes with additional information."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "jlF_8H2f2ZEI"
   },
   "outputs": [],
   "source": [
    "image = \"Socks.jpg\" # @param [\"Socks.jpg\",\"Vegetables.jpg\",\"Japanese_bento.png\",\"Cupcakes.jpg\",\"Origamis.jpg\",\"Fruits.jpg\",\"Cat.jpg\",\"Pumpkins.jpg\",\"Breakfast.jpg\",\"Bookshelf.jpg\", \"Spill.jpg\"] {\"allow-input\":true}\n",
    "prompt = \"Find the sock that matches the one at the top and return the bounding box for that sock\"  # @param [\"Detect all rainbow socks\", \"Show me the positions of the socks with the face\",\"Find the sock that matches the one at the top and return the bounding box for that sock\"] {\"allow-input\":true}\n",
    "\n",
    "# Run inference to find bounding boxes\n",
    "response = model.generate_content(\n",
    "    contents=[prompt, encode_image(image)]\n",
    ")\n",
    "\n",
    "# Check output\n",
    "print(response.text)\n",
    "\n",
    "# Generate image with bounding boxes\n",
    "plot_bounding_boxes(Image.open(image), response.text)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "-SctVyhDFMOK"
   },
   "source": [
    "Try it with different images and prompts. Different samples are proposed but you can also write your own."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "tvVSSr7z3uN4"
   },
   "source": [
    "## Multilingual capabilities\n",
    "\n",
    "As Gemini is able to understand multiple languages, you can combine spatial reasoning with multilingual capabilities.\n",
    "\n",
    "You can give it an image like this and prompt it to label each item with Japanese characters and English translation. The model reads the text and recognize the pictures from the image itself and translates them."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "NLmbPUTg3wxx"
   },
   "outputs": [],
   "source": [
    "image = \"Japanese_bento.png\" # @param [\"Socks.jpg\",\"Vegetables.jpg\",\"Japanese_bento.png\",\"Cupcakes.jpg\",\"Origamis.jpg\",\"Fruits.jpg\",\"Cat.jpg\",\"Pumpkins.jpg\",\"Breakfast.jpg\",\"Bookshelf.jpg\", \"Spill.jpg\"] {\"allow-input\":true}\n",
    "prompt = \"Explain what those dishes are with a 5 words description\"  # @param [\"Detect food items, with Japanese characters + english translation in \\\"label\\\".\", \"Show me the vegan dishes\",\"Explain what those dishes are with a 5 words description\",\"Find the dishes with allergens and label them accordingly\"] {\"allow-input\":true}\n",
    "\n",
    "# Run inference to find bounding boxes\n",
    "response = model.generate_content(\n",
    "    contents=[prompt, encode_image(image)]\n",
    ")\n",
    "\n",
    "# Generate image with bounding boxes\n",
    "plot_bounding_boxes(Image.open(image), response.text)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "Js_j7c5b97Pr"
   },
   "source": [
    "### Summary\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "GZbhjYkUA86w"
   },
   "source": [
    "## Reasoning capabilities\n",
    "\n",
    "The model can also reason based on the image. You can ask it about the positions of items, their utility, or, like in this example, to find the shadow of a speficic item."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "ACTfRRZf-FUu"
   },
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "_-IHhnxgBNdD"
   },
   "outputs": [],
   "source": [
    "image = \"Origamis.jpg\" # @param [\"Socks.jpg\",\"Vegetables.jpg\",\"Japanese_bento.png\",\"Cupcakes.jpg\",\"Origamis.jpg\",\"Fruits.jpg\",\"Cat.jpg\",\"Pumpkins.jpg\",\"Breakfast.jpg\",\"Bookshelf.jpg\", \"Spill.jpg\"] {\"allow-input\":true}\n",
    "prompt = \"Draw a square around the fox' shadow\"  # @param [\"Find the two origami animals.\", \"Where are the origamis' shadows?\",\"Draw a square around the fox' shadow\"] {\"allow-input\":true}\n",
    "\n",
    "# Run inference to find bounding boxes\n",
    "response = model.generate_content(\n",
    "    contents=[prompt, encode_image(image)]\n",
    ")\n",
    "\n",
    "# Generate image with bounding boxes\n",
    "plot_bounding_boxes(Image.open(image), response.text)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "tQszSykaTBHo"
   },
   "source": [
    "If you check the previous examples, the [Japanese food](#scrollTo=tvVSSr7z3uN4) one in particular,multiple other prompt samples are provided to experiment with Gemini reasoning capabilities."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "wI5c6Qe_-Hf5"
   },
   "source": [
    "### Summary\n",
    "\n",
    "This notebook demonstrates a few ways to leverage Gemini 2.0's spacial reasoning for various tasks, including object detection, spatial reasoning, and multilingual capabilities."
   ]
  }
 ],
 "metadata": {
  "colab": {
   "include_colab_link": true,
   "provenance": []
  },
  "kernelspec": {
   "display_name": "Python 3",
   "name": "python3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}
